Flights Dataset

by (Nawaf Alyousef)

Table of Contents:

Introduction

This dataset contains information about all aspects of each flight, such as departure time, arrival time, departure airport, and arrival airport.

Data Wrangling

Gathering

Assessing

Quality issues:

Cleaning

What is the structure of your dataset?

The dataset has 5,683,047 rows (RangeIndex: 5683047 entries) and 23 columns:
1-Year, 2-Month, 3-DayofMonth, 4-DayOfWeek, 5-DepTime, 6-CRSDepTime, 7-ArrTime,
8-CRSArrTime, 9-UniqueCarrier, 10-FlightNum, 11-TailNum, 12-ActualElapsedTime,
13-CRSElapsedTime, 14-AirTime, 15-ArrDelay, 16-DepDelay, 17-Origin, 18-Dest,
19-Distance, 20-TaxiIn, 21-TaxiOut, 22-Cancelled, 23-Diverted.
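
The structure above can be checked directly in pandas. This is a minimal sketch, not the original notebook code: the small `flights` frame is made up for illustration and holds only a few of the columns, and the variable names are assumptions.

```python
import pandas as pd

# Made-up sample with a handful of the real column names.
# The actual dataset has 5,683,047 rows.
flights = pd.DataFrame({
    "Month": [1, 1, 2],
    "DayOfWeek": [3, 4, 5],
    "UniqueCarrier": ["WN", "DL", "AA"],
    "Distance": [810, 515, 1235],
    "Cancelled": [0, 0, 1],
})

# shape gives (number of rows, number of columns)
n_rows, n_cols = flights.shape
column_names = list(flights.columns)

print(f"{n_rows} rows, {n_cols} columns")
print(column_names)
print(flights.dtypes)  # one dtype per column
```

On the full dataset, `flights.info()` prints the same information together with the RangeIndex line quoted above.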

What is/are the main feature(s) of interest in your dataset?

I am interested in columns such as FlightNum, Month, DayOfWeek, Distance, ArrDelay, DepDelay, Cancelled, and UniqueCarrier.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

I think the features that will help support my investigation are FlightNum, Distance, Cancelled, UniqueCarrier, Month, DayOfWeek, ArrDelay, and DepDelay.

Univariate Exploration

Which days have a large number of flights?

The diagram shows that all days have roughly the same number of flights, except the sixth day, which has fewer flights than the other days.
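
The counts behind a plot like this can be computed with `value_counts`. This is a sketch under assumptions: the rows are made up, and "number of flights" is taken to mean the number of rows per day.

```python
import pandas as pd

# Made-up rows; only the DayOfWeek column matters here.
flights = pd.DataFrame({"DayOfWeek": [1, 1, 2, 2, 3, 3, 3, 6, 7, 7]})

# Count rows per day, then order by day number instead of by count.
flights_per_day = flights["DayOfWeek"].value_counts().sort_index()

busiest_day = flights_per_day.idxmax()   # day with the most flights
quietest_day = flights_per_day.idxmin()  # day with the fewest flights

print(flights_per_day)
print("busiest:", busiest_day, "quietest:", quietest_day)
```

The same pattern works for the Month column by swapping the column name.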

Which months have a large number of flights?

The plot shows that months 3 and 8 have more flights than the other months of the year.

Is the number of canceled flights high or low?

The plot shows that the number of canceled flights is very low.

Which unique carriers are the most common?

The plot shows that the most common carriers are WN and DL.

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

Distributions were normal. The distributions discussed were the number of flights per day, the number of flights per month, the cancellation rate across all flights, and the unique carrier. Yes, I needed to convert the Cancelled column to boolean.
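
The conversion of the Cancelled column can be sketched as follows. The sample values are made up, and the sketch assumes the column is stored as 0/1 integers before conversion.

```python
import pandas as pd

# Made-up sample: 1 means the flight was cancelled, 0 means it was not.
flights = pd.DataFrame({"Cancelled": [0, 0, 1, 0, 0]})

# Convert the 0/1 flags to True/False.
flights["Cancelled"] = flights["Cancelled"].astype(bool)

# The mean of a boolean column is the share of True values,
# which here is the overall cancellation rate.
cancelled_share = flights["Cancelled"].mean()

print(flights["Cancelled"].dtype)
print(cancelled_share)
```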

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Distributions were normal. No.

Bivariate Exploration

Which months have the largest number of flights?

This plot examines how the number of flights relates to the month. Months 7 through 12 had the largest number of flights.

Which unique carrier has the maximum distance, and which has the minimum distance?

This plot shows which UniqueCarrier has the maximum distance and which has the minimum distance. The carrier with the maximum distance is CO, and the carrier with the minimum distance is AA.
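
One way to get the maximum and minimum distance per carrier is a grouped aggregation. This is only a sketch: the distances are invented for illustration and are not results from the real data.

```python
import pandas as pd

# Invented distances, for illustration only.
flights = pd.DataFrame({
    "UniqueCarrier": ["CO", "CO", "AA", "AA", "WN"],
    "Distance": [2500, 400, 150, 900, 600],
})

# One row per carrier with its shortest and longest flight distance.
distance_by_carrier = (
    flights.groupby("UniqueCarrier")["Distance"].agg(["min", "max"])
)

# Carrier holding the overall longest and overall shortest flight.
longest_carrier = distance_by_carrier["max"].idxmax()
shortest_carrier = distance_by_carrier["min"].idxmin()

print(distance_by_carrier)
print("max distance:", longest_carrier, "min distance:", shortest_carrier)
```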

Which months have a high number of cancellations?

This diagram examines how the cancellation rate varies by month and which months have a higher rate than the rest. Months 1 and 12 had a higher cancellation rate than the other months.
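
A cancellation rate per month can be sketched as the mean of the Cancelled flag within each month. The rows below are made up, and the sketch assumes Cancelled holds 0/1 (or boolean) values.

```python
import pandas as pd

# Made-up rows for three months.
flights = pd.DataFrame({
    "Month":     [1, 1, 1, 1, 6, 6, 6, 6, 12, 12, 12],
    "Cancelled": [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
})

# Mean of a 0/1 flag within each group = cancelled flights / all flights.
cancel_rate_by_month = flights.groupby("Month")["Cancelled"].mean()

# Month with the highest cancellation rate.
worst_month = cancel_rate_by_month.idxmax()

print(cancel_rate_by_month)
print("highest rate in month", worst_month)
```

Using a rate instead of a raw count keeps months with more flights from looking worse only because they are busier. Grouping by UniqueCarrier instead of Month gives the per-carrier rate in the same way.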

Which unique carriers have a high number of cancellations?

This diagram examines the cancellation rate of each unique carrier. UA had the highest cancellation rate, while US, DL, and AA had the same cancellation rate.

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The relationships discussed were: maximum number of flights vs. month, which is a strong relationship; maximum distance vs. UniqueCarrier and minimum distance vs. UniqueCarrier, which are strong relationships; Cancelled vs. UniqueCarrier, which is a moderate relationship; and Cancelled vs. month, which is a moderate relationship.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Yes. The maximum number of flights vs. month is a strong relationship, as are maximum distance vs. UniqueCarrier and minimum distance vs. UniqueCarrier.

Multivariate Exploration

What is the relationship between UniqueCarrier, FlightNum, Airport_Dest, and Cancelled?

The graph shows the relationship between UniqueCarrier, FlightNum, Airport_Dest, and Cancelled. At destination airport ATL, carrier DL had more flights than at the other airports, with a minimal number of cancellations. At destination airport ORD, carriers AA and UA had the same number of flights and the same number of cancellations.
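
The numbers behind a carrier-by-destination comparison can be sketched with a two-key groupby. Everything here is an assumption made for illustration: the rows are made up, the destination is read from the Dest column, and "number of flights" is counted as rows per group.

```python
import pandas as pd

# Made-up rows; airport codes and flags are for illustration only.
flights = pd.DataFrame({
    "UniqueCarrier": ["DL", "DL", "DL", "AA", "UA", "AA"],
    "Dest":          ["DEN", "DEN", "DEN", "ORD", "ORD", "DEN"],
    "Cancelled":     [0, 0, 1, 0, 1, 0],
})

# For each (destination, carrier) pair: how many flights, how many cancelled.
summary = (
    flights.groupby(["Dest", "UniqueCarrier"])
    .agg(n_flights=("Cancelled", "size"), n_cancelled=("Cancelled", "sum"))
    .reset_index()
)

print(summary)
```

The resulting long table has one row per destination and carrier, which is a convenient shape for a grouped bar chart.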

Which month has the maximum number of flights for each UniqueCarrier?

This visualization shows the relationship between UniqueCarrier, FlightNum, and Month, to find which month has the maximum number of flights. From month 1 to month 3, carrier US had the maximum number of flights; after that, its number of flights decreased. The average number of flights across all carriers is 1150, and WN had the fewest flights of all the carriers.
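
A month-by-carrier table of flight counts can be sketched with `pd.crosstab`. The rows are made up, and the sketch counts rows per cell; the original plot may have used the FlightNum column in a different way.

```python
import pandas as pd

# Made-up rows: one row per flight.
flights = pd.DataFrame({
    "Month":         [1, 1, 1, 2, 2, 2, 3, 3],
    "UniqueCarrier": ["US", "US", "WN", "US", "DL", "US", "DL", "DL"],
})

# Rows are months, columns are carriers, cells are flight counts.
flights_by_month_carrier = pd.crosstab(flights["Month"], flights["UniqueCarrier"])

# For each month, the carrier with the most flights.
top_carrier_per_month = flights_by_month_carrier.idxmax(axis=1)

print(flights_by_month_carrier)
print(top_carrier_per_month)
```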

What is the relationship between UniqueCarrier, Month, DayOfWeek, and FlightNum?

The plot shows the relationship between UniqueCarrier, Month, DayOfWeek, and FlightNum. Carrier DL has more flights than carrier US. For both US and DL, the longest stretches with the maximum number of flights are in month 5, from day 1 to day 5, and in month 12, from day 5 to day 7.
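
For one carrier at a time, the month and day-of-week pattern can be laid out as a grid of flight counts. This is a sketch with made-up rows; `month_day_grid` is a hypothetical helper name, and flights are counted as rows.

```python
import pandas as pd

# Made-up rows: one row per flight.
flights = pd.DataFrame({
    "UniqueCarrier": ["DL", "DL", "DL", "US", "US", "DL"],
    "Month":         [5, 5, 12, 5, 12, 12],
    "DayOfWeek":     [1, 1, 5, 1, 7, 5],
})

def month_day_grid(df, carrier):
    """Return a Month x DayOfWeek table of flight counts for one carrier."""
    subset = df[df["UniqueCarrier"] == carrier]
    # size() counts rows per (Month, DayOfWeek); unstack turns days into columns.
    return subset.groupby(["Month", "DayOfWeek"]).size().unstack(fill_value=0)

dl_grid = month_day_grid(flights, "DL")
us_grid = month_day_grid(flights, "US")

print(dl_grid)
print(us_grid)
```

A grid like this can be passed to a heatmap function, with one panel per carrier, to compare carriers side by side.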

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

For UniqueCarrier, FlightNum, Dest, and Cancelled, I noticed that at destination airport ATL carrier DL had a large number of flights, and that at destination airport ORD carriers AA and UA had the same number of flights and the same number of cancellations. For UniqueCarrier, FlightNum, and Month, I noticed that from month 1 to month 3 carrier US had the maximum number of flights, and that WN had the fewest flights of all the carriers.

Were there any interesting or surprising interactions between features?

No

Sources